Smart Paper Notes

Common human diseases prediction using machine learning based on survey data

Jabir Al Nahian , Abu Kaisar Mohammad Masum , Sheikh Abujar , Md. Jueal Mia

Category: Machine Learning

2022-09-22

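In this era, the moment has arrived to move away from disease as the primary emphasis of medical treatment. Many techniques have been developed to detect diseases, and the results are impressive. Today, several conditions are very common: COVID-19, normal flu, migraine, lung disease, heart disease, kidney disease, diabetes, stomach disease, gastric disease, bone disease, and autism. In this analysis, we study disease symptoms and predict diseases from them. We examined a range of symptoms and surveyed people to complete the task. Several classification algorithms were used to train the model, and performance evaluation metrics were used to measure the model's performance. Finally, we found that the PART classifier outperforms the other classifiers.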

translated by Google Translate

A Temporal Graph Neural Network for Cyber Attack Detection and Localization in Smart Grids

Seyed Hamed Haghshenas , Md Abul Hasnat , Mia Naeini

Category: Machine Learning

2022-12-07

This paper presents a Temporal Graph Neural Network (TGNN) framework for detection and localization of false data injection and ramp attacks on the system state in smart grids. Capturing the topological information of the system through the GNN framework along with the state measurements can improve the performance of the detection mechanism. The problem is formulated as a classification problem through a GNN with message passing mechanism to identify abnormal measurements. The residual block used in the aggregation process of message passing and the gated recurrent unit can lead to improved computational time and performance. The performance of the proposed model has been evaluated through extensive simulations of power system states and attack scenarios showing promising performance. The sensitivity of the model to intensity and location of the attacks and model's detection delay versus detection accuracy have also been evaluated.
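The message-passing idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the aggregator, the learned weights, or the gated recurrent unit details, so everything below is an assumption made for illustration: a mean aggregator, no trainable parameters, and a residual connection that adds each node's own state back to the aggregated neighbor message. The function name `message_passing_step` and the toy three-bus line graph are hypothetical.

```python
from typing import Dict, List


def message_passing_step(
    features: List[List[float]],
    neighbors: Dict[int, List[int]],
) -> List[List[float]]:
    """One round of mean-aggregation message passing with a residual connection.

    Assumption: no learned weights and no GRU; this only shows how topology
    (the neighbor lists) mixes state measurements between connected nodes.
    """
    updated = []
    for node, own in enumerate(features):
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Average each feature dimension over the node's neighbors.
            agg = [
                sum(features[n][k] for n in nbrs) / len(nbrs)
                for k in range(len(own))
            ]
        else:
            agg = [0.0] * len(own)
        # Residual connection: keep the node's own state and add the message.
        updated.append([o + a for o, a in zip(own, agg)])
    return updated


# Hypothetical 3-bus line graph 0 - 1 - 2 with one state value per bus.
states = [[1.0], [2.0], [4.0]]
topology = {0: [1], 1: [0, 2], 2: [1]}
print(message_passing_step(states, topology))
```

In a trained model, the aggregated message would pass through learned layers, and a recurrent unit would carry information across time steps; a classifier on the resulting node embeddings would then flag abnormal measurements per node.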

Muscles in Action

Mia Chiquier , Carl Vondrick

Category: Computer Vision

2022-12-05

Small differences in a person's motion can engage drastically different muscles. While most visual representations of human activity are trained from video, people learn from multimodal experiences, including from the proprioception of their own muscles. We present a new visual perception task and dataset to model muscle activation in human activities from monocular video. Our Muscles in Action (MIA) dataset consists of 2 hours of synchronized video and surface electromyography (sEMG) data of subjects performing various exercises. Using this dataset, we learn visual representations that are predictive of muscle activation from monocular video. We present several models, including a transformer model, and measure their ability to generalize to new exercises and subjects. Putting muscles into computer vision systems will enable richer models of virtual humans, with applications in sports, fitness, and AR/VR.

Private Multiparty Perception for Navigation

Hui Lu , Mia Chiquier , Carl Vondrick

Category: Machine Learning | Computer Vision

2022-12-02

We introduce a framework for navigating through cluttered environments by connecting multiple cameras together while simultaneously preserving privacy. Occlusions and obstacles in large environments are often challenging situations for navigation agents because the environment is not fully observable from a single camera view. Given multiple camera views of an environment, our approach learns to produce a multiview scene representation that can only be used for navigation, provably preventing one party from inferring anything beyond the output task. On a new navigation dataset that we will publicly release, experiments show that private multiparty representations allow navigation through complex scenes and around obstacles while jointly preserving privacy. Our approach scales to an arbitrary number of camera viewpoints. We believe developing visual representations that preserve privacy is increasingly important for many applications such as navigation.

Building the Intent Landscape of Real-World Conversational Corpora with Extractive Question-Answering Transformers

Jean-Philippe Corbeil , Mia Taige Li , Hadi Abdi Ghavidel

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-08-26

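For companies with customer service, mapping the intents in their conversational data is crucial for building applications based on natural language understanding (NLU). However, there is no established automated technique for gathering intents from noisy online chats or voice transcripts. Simple clustering approaches are not well suited to intent-sparse dialogues. To solve this intent-landscape task, we propose an unsupervised pipeline that extracts intents and their taxonomy from real-world dialogues. Our pipeline mines intent-span candidates with an extractive question-answering ELECTRA model and leverages sentence embeddings to apply low-level density clustering followed by top-level hierarchical clustering. Our results demonstrate the generalization ability of an ELECTRA large model fine-tuned on the SQuAD2 dataset to understand dialogues. With the right prompting question, this model achieves a linguistic validation rate of over 85% on intent spans. We furthermore reconstructed the intent schemes of five domains from the MultiDoGo dataset with an average recall of 94.3%.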

Data Science and Machine Learning in Education

Gabriele Benelli , Thomas Y. Chen , Javier Duarte , Matthew Feickert , Matthew Graham , Lindsey Gray , Dan Hackett , Phil Harris , Shih-Chieh Hsu , Gregor Kasieczka

Category: Machine Learning

2022-07-19

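The growing role of data science (DS) and machine learning (ML) in high-energy physics (HEP) is well established and pertinent, given what lies at the heart of HEP research. Moreover, exploiting the symmetries inherent in physics data has inspired physics-informed ML as a vibrant subfield of computer science research. HEP researchers benefit greatly from widely available materials for education, training, and workforce development. They also contribute to these materials and provide software to DS/ML-related fields. Increasingly, physics departments offer courses at the intersection of DS, ML, and physics, often using curricula developed by HEP researchers and involving open software and data used in HEP. In this white paper, we explore the synergies between HEP research and DS/ML education, discuss opportunities and challenges at this intersection, and propose community activities that will be mutually beneficial.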

MIA 2022 Shared Task Submission: Leveraging Entity Representations, Dense-Sparse Hybrids, and Fusion-in-Decoder for Cross-Lingual Question Answering

Zhucheng Tu , Sarguna Janani Padmanabhan

Category: Natural Language Processing

2022-07-05

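We describe our two-stage system for the Multilingual Information Access (MIA) 2022 Shared Task on Cross-Lingual Open-Retrieval Question Answering. The first stage consists of multilingual passage retrieval with a hybrid dense and sparse retrieval strategy. The second stage consists of a reader that outputs the answer from the top passages returned by the first stage. We show the efficacy of using entity representations, sparse retrieval signals to help dense retrieval, and Fusion-in-Decoder. On the development set, we obtain 43.46 F1 on XOR-TyDi QA and 21.99 F1 on MKQA, for an average F1 score of 32.73. On the test set, we obtain 40.93 F1 on XOR-TyDi QA and 22.29 F1 on MKQA, for an average F1 score of 31.61. On both the development and test sets, we improve over the official baseline by 4 F1 points.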
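As a rough illustration of how dense and sparse retrieval scores can be combined, the sketch below uses min-max normalization followed by linear interpolation. This is a generic fusion technique chosen for illustration, not necessarily the method used in the submission; the function names, the weight `alpha`, and the passage ids and scores are all hypothetical. The last two lines recompute the reported average F1 scores from the per-dataset numbers in the abstract.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    dense: Dict[str, float], sparse: Dict[str, float], alpha: float = 0.5
) -> List[str]:
    """Rank passages by alpha * dense + (1 - alpha) * sparse (assumed fusion rule).

    A passage missing from one retriever's results contributes 0 for that part.
    """
    d, s = min_max_normalize(dense), min_max_normalize(sparse)
    docs = set(d) | set(s)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in docs
    }
    # Sort by fused score (descending), breaking ties by passage id.
    return sorted(docs, key=lambda doc: (-fused[doc], doc))


def macro_average(scores: List[float]) -> float:
    """Unweighted mean across datasets."""
    return sum(scores) / len(scores)


# Hypothetical passage ids and scores, for illustration only.
dense_scores = {"p1": 0.82, "p2": 0.40, "p3": 0.75}
sparse_scores = {"p2": 12.0, "p3": 9.0, "p4": 3.0}
print(hybrid_rank(dense_scores, sparse_scores, alpha=0.6))

# Averages of the F1 scores reported in the abstract.
print(macro_average([43.46, 21.99]))  # development set, reported as 32.73
print(macro_average([40.93, 22.29]))  # test set, reported as 31.61
```

Passage `p3` ranks first here because both retrievers score it highly, which is the intuition behind letting sparse signals support dense retrieval.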

MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question Answering for 16 Diverse Languages

Akari Asai , Shayne Longpre , Jungo Kasai , Chia-Hsuan Lee , Rui Zhang , Junjie Hu , Ikuya Yamada , Jonathan H. Clark , Eunsol Choi

Category: Natural Language Processing

2022-07-02

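We present the results of the Workshop on Multilingual Information Access (MIA) 2022 Shared Task, evaluating cross-lingual open-retrieval question answering (QA) systems in 16 typologically diverse languages. In this task, we adapted two large-scale cross-lingual open-retrieval QA datasets in 14 typologically diverse languages, and used newly annotated open-retrieval QA data in 2 underrepresented languages: Tagalog and Tamil. Four teams submitted their systems. The best system, which leverages iteratively mined diverse negative examples and larger pretrained models, achieves 32.2 F1, outperforming our baseline by 4.5 points. The second-best system uses entity-aware contextualized representations for document retrieval and achieves significant improvements in Tamil (20.8 F1), whereas most of the other systems score nearly zero.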
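For readers unfamiliar with the F1 numbers quoted above, the following sketch shows the token-overlap F1 commonly used for extractive QA (in the style of the SQuAD evaluation). The shared task's official scoring script is not described in this abstract, so the whitespace tokenization, the lowercasing, and the function name `token_f1` are assumptions.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: lowercase and split on whitespace; real scripts often also
    strip punctuation and use language-specific tokenizers.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical answers: one of four predicted tokens matches the reference.
print(token_f1("the city of Chennai", "Chennai"))
```

A system-level score is then typically the mean of this per-question value, which is why a system that rarely overlaps with the reference answers in a language ends up near zero.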

Building Machine Translation Systems for the Next Thousand Languages

Ankur Bapna , Isaac Caswell , Julia Kreutzer , Orhan Firat , Daan van Esch , Aditya Siddhant , Mengmeng Niu , Pallavi Baljekar , Xavier Garcia , Wolfgang Macherey

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-05-09

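In this paper, we share findings from our effort to build practical machine translation (MT) systems capable of translating across more than one thousand languages. We describe results in three research domains: (i) building clean, web-mined datasets for over 1500 languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques; (ii) developing practical MT models for under-served languages by leveraging massively multilingual models trained with supervised parallel data for over 100 high-resource languages and monolingual datasets for an additional 1000+ languages; and (iii) studying the limitations of evaluation metrics for these languages and conducting a qualitative analysis of the outputs of our MT models, highlighting several frequent error modes of these types of models. We hope that our work provides useful insights to practitioners working towards building MT systems for currently understudied languages, and highlights research directions that can complement the weaknesses of massively multilingual models in data-sparse settings.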

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

Aditya Siddhant , Ankur Bapna , Orhan Firat , Yuan Cao , Mia Xu Chen , Isaac Caswell , Xavier Garcia

Category: Natural Language Processing | Machine Learning

2022-01-09

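Achieving universal translation between all human language pairs is the holy grail of machine translation (MT) research. While recent progress in massively multilingual MT is one step closer to this goal, it is becoming evident that extending a multilingual MT system simply by training on more parallel data is unscalable, since the availability of labeled data for low-resource and non-English-centric language pairs is forbiddingly limited. To this end, we present a pragmatic approach to building a multilingual MT model that covers hundreds of languages, using a mixture of supervised and self-supervised objectives depending on the data availability for different language pairs. We demonstrate that the synergy between these two training paradigms enables the model to produce high-quality translations in the zero-resource setting, even surpassing supervised translation quality for low- and mid-resource languages. We conduct a wide array of experiments to understand the effect of the degree of multilingual supervision, domain mismatch, and the amounts of parallel and monolingual data on the quality of our self-supervised multilingual models. To demonstrate the scalability of the approach, we train models with over 200 languages and demonstrate high performance on zero-resource translation for several previously under-studied languages. We hope our findings will serve as a stepping stone towards enabling translation for the next thousand languages.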
